Abstract

In this paper, we establish the well-posedness of the generalized 
moment problems recently studied by Byrnes-Georgiou-Lindquist and 
coworkers, and by Ferrante-Pavon-Ramponi. We then apply these 
continuity results to prove almost sure convergence of a sequence of 
high-resolution spectral estimators indexed by the sample size. 
1 Introduction

Consider a linear, time-invariant system

$$x(t+1) = A x(t) + B y(t), \qquad A \in \mathbb{C}^{n \times n}, \quad B \in \mathbb{C}^{n \times m}, \tag{1}$$



with transfer function 



$$G(z) = (zI - A)^{-1} B, \tag{2}$$



where $A$ is a stability matrix, $B$ has full column rank, and $(A, B)$ is a reachable pair. Suppose that the system is fed with an $m$-dimensional, zero-mean, wide-sense stationary process $y$ having spectrum $\Phi$. The asymptotic state covariance $\Sigma$ of the system (1) satisfies:

$$\Sigma = \int G \Phi G^*. \tag{3}$$

Here and in the following, $G^*(z) = G^\top(z^{-1})$, and integration takes place over the unit circle with respect to the normalized Lebesgue measure $d\vartheta/2\pi$. Let $\mathcal{S}_+^{m\times m}(\mathbb{T})$ be the family of bounded, coercive, $\mathbb{C}^{m\times m}$-valued spectral density functions on the unit circle. Hence, $\Phi \in \mathcal{S}_+^{m\times m}(\mathbb{T})$ if and only if $\Phi^{-1} \in \mathcal{S}_+^{m\times m}(\mathbb{T})$. Given a Hermitian and positive-definite $n \times n$ matrix $\Sigma$, consider the problem of finding $\Phi \in \mathcal{S}_+^{m\times m}(\mathbb{T})$ that satisfies (3), i.e., that is compatible with $\Sigma$. This is a particular case of a moment problem. In the last ten years, much research has been produced, mainly by the Byrnes-Georgiou-Lindquist school, on generalized moment problems [3], [7], [1], [§], [10], on analytic interpolation with complexity constraint [1], and on their applications to spectral estimation [2], [12], [15] and robust control [11]. It is worth recalling that two fundamental problems of control theory, namely the covariance extension problem and the Nevanlinna-Pick interpolation problem of robust control, can be recast in this form [10].

Equation (3), where the unknown is $\Phi$, is also a typical example of an inverse problem. Recall that a problem is said to be well-posed, in the sense of Hadamard, if it admits a solution, the solution is unique, and the solution depends continuously on the data. Inverse problems are typically not well-posed. In our case, there may well be no solution $\Phi$, and when a solution exists, there may be (infinitely) many. It was shown in [S] that the set of solutions is nonempty if and only if there exists $H \in \mathbb{C}^{m\times n}$ such that

$$\Sigma - A \Sigma A^* = BH + H^* B^*. \tag{4}$$
When (4) is feasible with $\Sigma > 0$, there are infinitely many solutions $\Phi$ to (3). To select a particular solution it is natural to introduce an optimality criterion. For control applications, however, it is desirable that such a solution be of limited complexity: it should be rational, with an a priori bound on its McMillan degree. One of the great accomplishments of the Byrnes-Georgiou-Lindquist approach is having shown that the minimization of certain entropy-like functionals leads to solutions that satisfy this requirement. In [8], Georgiou provided an explicit expression for the spectrum $\Phi$ that exhibits maximum entropy rate among the solutions of (3).

Suppose now that some a priori information about $\Phi$ is available in the form of a spectrum $\Psi \in \mathcal{S}_+^{m\times m}(\mathbb{T})$. Given $G$, $\Sigma$, and $\Psi$, we now seek a spectrum $\Phi$ that is closest to $\Psi$ in a certain metric, among the solutions of (3). Paper [10] deals with such an optimization problem in the case when $y$ is a scalar process. The criterion there is the Kullback-Leibler pseudo-distance from $\Psi$ to $\Phi$. A drawback of this approach is that it does not seem to generalize to the multivariable case. This motivated us to provide a suitable extension of the so-called Hellinger distance with respect to which the multivariable version of the problem is solvable (see [6] and [15]).

The main result of this paper is contained in Section 3. We show there that, under the feasibility assumption, the solution to the spectrum approximation problem with respect to both the scalar Kullback-Leibler pseudo-distance and the multivariable Hellinger distance depends continuously on $\Sigma$, thereby proving that these problems are well-posed. In Section 4 we deal with the case when only an estimate $\hat{\Sigma}$ of $\Sigma$ is available. By applying the continuity results of Section 3, we prove a consistency result for the solutions to both approximation problems.

2 Spectrum approximation problems 

In this section, we collect some background material on spectrum approximation problems. The reader is referred to [8], [10], [6] and [15] for a more detailed treatment.

2.1 Feasibility of the moment problem 

Let $\mathcal{H}(n)$ be the space of Hermitian $n \times n$ matrices, and $C(\mathbb{T};\mathcal{H}(m))$ the space of $\mathcal{H}(m)$-valued continuous functions defined on the unit circle. Let the operator $\Gamma : C(\mathbb{T};\mathcal{H}(m)) \to \mathcal{H}(n)$ be defined as follows:

$$\Gamma(\Phi) := \int G \Phi G^*. \tag{5}$$



Consider now the range of the operator $\Gamma$ (as a vector space over the reals). We have the following result (see [15]).

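Proposition 2.1

1. Let $\Sigma = \Sigma^* > 0$. The following are equivalent:

• There exists $H \in \mathbb{C}^{m\times n}$ which solves (4).

• There exists $\Phi \in \mathcal{S}_+^{m\times m}(\mathbb{T})$ such that $\int G \Phi G^* = \Sigma$.

• There exists $\Phi \in C(\mathbb{T};\mathcal{H}(m))$, $\Phi > 0$, such that $\Gamma(\Phi) = \Sigma$.

2. Let $\Sigma = \Sigma^*$ (not necessarily definite). There exists $H \in \mathbb{C}^{m\times n}$ that solves (4) if and only if $\Sigma \in \operatorname{Range}\Gamma$.

3. $X \in (\operatorname{Range}\Gamma)^\perp$ if and only if $G^*(e^{j\vartheta})\, X\, G(e^{j\vartheta}) = 0 \ \ \forall \vartheta \in [0, 2\pi]$.

We define

$$\mathcal{P}_\Gamma := \{\Sigma \in \operatorname{Range}\Gamma \mid \Sigma > 0\}. \tag{6}$$

In view of Proposition 2.1, for each $\Sigma \in \mathcal{P}_\Gamma$ problem (3) is feasible.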
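The feasibility condition (4) is linear over the reals in the unknown $H$, so it can be checked numerically by least squares. The following is a minimal sketch under our own assumptions (the function name `feasibility_residual`, the example matrices and the use of NumPy are illustrative and not part of the cited works): it returns the residual of (4), which is numerically zero exactly when $\Sigma \in \operatorname{Range}\Gamma$.

```python
import numpy as np

def feasibility_residual(A, B, Sigma):
    """Least-squares residual of Sigma - A Sigma A^* = B H + H^* B^* over H."""
    n, m = B.shape
    rhs = Sigma - A @ Sigma @ A.conj().T
    columns = []
    # The map H -> B H + H^* B^* is linear over the reals: apply it to a real
    # basis of C^{m x n} (units 1 and 1j in each entry), stacking real/imag parts.
    for j in range(m):
        for k in range(n):
            for unit in (1.0, 1.0j):
                H = np.zeros((m, n), dtype=complex)
                H[j, k] = unit
                image = B @ H + H.conj().T @ B.conj().T
                columns.append(np.concatenate([image.real.ravel(), image.imag.ravel()]))
    M = np.column_stack(columns)
    b = np.concatenate([rhs.real.ravel(), rhs.imag.ravel()])
    coeffs, *_ = np.linalg.lstsq(M, b, rcond=None)
    return float(np.linalg.norm(M @ coeffs - b))

# Hypothetical example: stable A, reachable pair (A, B), scalar input (m = 1).
A = np.diag([0.5, -0.3, 0.2])
B = np.ones((3, 1))

# State covariance for a white-noise input (Phi = I): Sigma = A Sigma A^* + B B^*,
# obtained here by fixed-point iteration (A is a stability matrix).
Sigma_white = np.zeros((3, 3))
for _ in range(200):
    Sigma_white = A @ Sigma_white @ A.T + B @ B.T

res_feasible = feasibility_residual(A, B, Sigma_white)   # Sigma in Range Gamma
res_infeasible = feasibility_residual(A, B, np.eye(3))   # identity is not, for this (A, B)
print(res_feasible, res_infeasible)
```

For the white-noise covariance, $H = B^*/2$ solves (4) exactly, so the first residual vanishes up to rounding; the identity matrix is positive definite but does not belong to $\operatorname{Range}\Gamma$ for this particular pair, which illustrates that positivity alone does not give feasibility.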



2.2 Scalar approximation in the Kullback-Leibler pseudo-distance

In [10], the Kullback-Leibler pseudo-distance for spectral densities in $\mathcal{S}_+^{1\times 1}(\mathbb{T})$ was introduced:

$$D(\Psi\|\Phi) = \int \Psi \log\frac{\Psi}{\Phi}. \tag{7}$$

As is well known, the corresponding quantity for probability densities orig- 
inates in hypothesis testing, where it represents the mean information per 
observation for discrimination of an underlying probability density from an- 
other [13]. The approximation problem goes as follows: 

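Problem 2.2 Given $\Sigma \in \mathcal{P}_\Gamma$ and $\Psi \in \mathcal{S}_+^{1\times 1}(\mathbb{T})$, find $\Phi_o^{KL}$ that solves

$$\text{minimize } D(\Psi\|\Phi) \quad \text{over } \left\{\Phi \in \mathcal{S}_+^{1\times 1}(\mathbb{T}) \;\middle|\; \int G \Phi G^* = \Sigma\right\}. \tag{8}$$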
Note that, following [10], and differently from optimization problems that are usual in the probability setting, we minimize (7) with respect to the second argument. The remarkable advantage of this approach is that, differently from optimization with respect to the first argument, it yields a rational solution whenever $\Psi$ is rational. Let

$$\mathcal{L}^{KL} := \{\Lambda \in \mathcal{H}(n) \mid G^*\Lambda G > 0 \ \ \forall e^{j\vartheta} \in \mathbb{T}\}.$$

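For a given $\Lambda \in \mathcal{L}^{KL}$, consider the Lagrangian functional

$$L(\Phi, \Lambda) = D(\Psi\|\Phi) + \left\langle \Lambda, \int G \Phi G^* - \Sigma \right\rangle, \tag{9}$$

where $\langle A, B\rangle := \operatorname{tr} AB$ denotes the scalar product between the Hermitian matrices $A$ and $B$. Observe that the term $\int G \Phi G^*$ between brackets belongs to $\operatorname{Range}\Gamma$ by definition, while $\Sigma$ belongs to $\operatorname{Range}\Gamma$ by the feasibility assumption. Hence, it is natural to restrict $\Lambda$ to $\operatorname{Range}\Gamma$, or, which is the same, to

$$\mathcal{L}_\Gamma^{KL} := \mathcal{L}^{KL} \cap \operatorname{Range}\Gamma.$$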

The functional (9) is strictly convex on $\mathcal{S}_+^{1\times 1}(\mathbb{T})$. Hence, its unconstrained minimization with respect to $\Phi$ can be pursued by imposing that its derivative in an arbitrary direction $\delta\Phi$ is zero. This yields the form of the optimal spectrum:

$$\Phi_o^{KL} = \frac{\Psi}{G^*\Lambda G}. \tag{10}$$

As noted previously, inasmuch as $\Psi$ is rational, $\Phi_o^{KL}$ is also rational, with McMillan degree less than or equal to $2n + \deg\Psi$. Now if $\Lambda \in \mathcal{L}_\Gamma^{KL}$ is such that

$$\int G\, \frac{\Psi}{G^*\Lambda G}\, G^* = \Sigma, \tag{11}$$

that is, if $\Lambda$ is such that the corresponding optimal spectrum satisfies the constraint, then (10) is the unique solution to the constrained approximation Problem 2.2. Finding such a $\Lambda$ is the objective of the dual problem, which is readily seen [10] to be equivalent to

$$\text{minimize } \{ J_\Psi^{KL}(\Lambda) \mid \Lambda \in \mathcal{L}_\Gamma^{KL} \}, \tag{12}$$

where

$$J_\Psi^{KL}(\Lambda) = -\int \Psi \log G^*\Lambda G + \operatorname{tr} \Lambda\Sigma. \tag{13}$$
This is also a convex optimization problem. Existence of a minimum is a highly nontrivial issue. It was proved in [10] by resorting to a profound topological result, and in [5] by a less abstract argument.

Theorem 2.3 The strictly convex functional $J_\Psi^{KL}$ has a unique minimum point in $\mathcal{L}_\Gamma^{KL}$.



The minimum point of Theorem 2.3 provides the optimal solution to the primal Problem 2.2 via (10). Differently from the primal problem, whose domain $\mathcal{S}_+^{1\times 1}(\mathbb{T})$ is infinite-dimensional, the dual problem is finite-dimensional, hence the minimization of $J_\Psi^{KL}$ can be accomplished with iterative numerical methods. The numerical minimization of $J_\Psi^{KL}$ is not, however, a simple problem, because both the functional and its gradient are unbounded on $\mathcal{L}_\Gamma^{KL}$ (which is itself unbounded). Moreover, reparametrization of $\mathcal{L}_\Gamma^{KL}$ may lead to loss of convexity (see [10] and references therein). An alternative approach to this problem was proposed in [14].
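To make the finite-dimensional dual problem concrete, the sketch below evaluates the dual functional (13) and its gradient on a uniform grid of the unit circle. It is only an illustration under our own assumptions: the names `transfer_on_circle` and `kl_dual`, the example system and the prior `psi` are hypothetical, and averaging over a uniform grid is used as a stand-in for integration with respect to $d\vartheta/2\pi$. Differentiating (13) in a Hermitian direction $\delta\Lambda$ gives $\langle \delta\Lambda, \Sigma - \int G \Phi G^* \rangle$ with $\Phi$ as in (10), which is what the code returns as the gradient.

```python
import numpy as np

def transfer_on_circle(A, B, thetas):
    """Samples G(e^{j theta}) = (e^{j theta} I - A)^{-1} B; result has shape (N, n, m)."""
    n = A.shape[0]
    return np.stack([np.linalg.solve(np.exp(1j * t) * np.eye(n) - A, B) for t in thetas])

def kl_dual(Lam, Sigma, psi, Gs):
    """Value and gradient of the Kullback-Leibler dual functional (scalar input, m = 1)."""
    g = Gs[:, :, 0]                                        # G is n x 1 at every grid point
    q = np.einsum('ti,ij,tj->t', g.conj(), Lam, g).real    # G^* Lam G on the grid
    if np.any(q <= 0):
        return np.inf, None                                # Lam is outside the domain
    value = -np.mean(psi * np.log(q)) + np.trace(Lam @ Sigma).real
    phi = psi / q                                          # candidate spectrum, form (10)
    moment = np.einsum('t,ti,tj->ij', phi, g, g.conj()) / len(phi)
    return value, Sigma - moment                           # gradient: Sigma - int G phi G^*

# Hypothetical example data.
A = np.diag([0.5, -0.3, 0.2])
B = np.ones((3, 1))
thetas = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
Gs = transfer_on_circle(A, B, thetas)
psi = 1.0 + 0.5 * np.cos(thetas)                           # a positive scalar prior
g = Gs[:, :, 0]
Sigma = np.einsum('ti,tj->ij', g, g.conj()) / len(thetas)  # covariance for white noise
Lam = np.eye(3, dtype=complex)                             # G^* Lam G = |G|^2 > 0
value, grad = kl_dual(Lam, Sigma, psi, Gs)
print(value, np.linalg.norm(grad))
```

A vanishing gradient is exactly the moment constraint (11), which is why a minimizer of the dual functional produces a feasible spectrum. Any descent method built on this evaluation has to keep the iterates inside the domain, where $G^*\Lambda G > 0$ on the whole circle.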



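2.3 Multivariable approximation in the Hellinger distance

In [6] the Hellinger distance between two spectral densities $\Phi, \Psi \in \mathcal{S}_+^{1\times 1}(\mathbb{T})$ was introduced:

$$d_H(\Phi, \Psi) := \left[\int \left(\sqrt{\Phi} - \sqrt{\Psi}\right)^2\right]^{1/2}. \tag{14}$$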



As in the Kullback-Leibler case, its counterpart for probability densities is well known in mathematical statistics. Differently from the Kullback-Leibler case, this is a bona fide distance (note that (14) is nothing more than the $L_2$ distance between the square roots of $\Phi$ and $\Psi$, and that the square roots are particular instances of spectral factors). A variational analysis similar to the one we have just seen is possible and leads to similar results. Let us focus directly on the multivariable extension of (14) that was developed in [6]. Given $\Phi, \Psi \in \mathcal{S}_+^{m\times m}(\mathbb{T})$, we define the following quantity:

$$d_H(\Phi, \Psi) := \inf\left\{ \|W_\Psi - W_\Phi\|_2 \;:\; W_\Psi, W_\Phi \in L_2^{m\times m},\ W_\Psi W_\Psi^* = \Psi,\ W_\Phi W_\Phi^* = \Phi \right\}. \tag{15}$$



Observe that $d_H(\Phi, \Psi)$ is simply the $L_2$ distance between the sets of all the square spectral factors of $\Phi$ and $\Psi$, respectively. We have the following result (see [6]).
Theorem 2.4 The following facts hold true:

1. $d_H$ is a bona fide distance function.

2. $d_H(\Phi, \Psi)$ coincides with (14) when $\Phi$ and $\Psi$ are scalar.

3. The infimum in (15) is indeed a minimum.

4. For any square spectral factor $\bar{W}_\Psi$ of $\Psi$, we have:

$$d_H(\Phi, \Psi) = \inf\left\{ \|\bar{W}_\Psi - W_\Phi\|_2 \;:\; W_\Phi \in L_2^{m\times m},\ W_\Phi W_\Phi^* = \Phi \right\}.$$

Fact 4 says that, if we fix a spectral factor of one spectrum and minimize only among spectral factors of the other, the result is the same. Given $\Psi \in \mathcal{S}_+^{m\times m}(\mathbb{T})$ (and $G(z)$ of size $n \times m$), we pose a minimization problem similar to Problem 2.2.

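Problem 2.5 Given $\Sigma \in \mathcal{P}_\Gamma$ and $\Psi \in \mathcal{S}_+^{m\times m}(\mathbb{T})$, find $\Phi_o^{H}$ that solves

$$\text{minimize } d_H(\Phi, \Psi) \quad \text{over } \left\{\Phi \in \mathcal{S}_+^{m\times m}(\mathbb{T}) \;\middle|\; \int G \Phi G^* = \Sigma\right\}. \tag{16}$$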



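In view of facts 3 and 4 in Theorem 2.4, once a spectral factor $W_\Psi$ of $\Psi$ is fixed, Problem 2.5 can be reformulated in terms of a minimization with respect to spectral factors of $\Phi$:

Given $\Sigma \in \mathcal{P}_\Gamma$ and a spectral factor $W_\Psi$ of $\Psi \in \mathcal{S}_+^{m\times m}(\mathbb{T})$, find $W_\Phi$ that solves

$$\text{minimize } \operatorname{tr}\int (W_\Phi - W_\Psi)(W_\Phi - W_\Psi)^* \quad \text{over } \left\{ W_\Phi \in L_2^{m\times m} \;\middle|\; \int G W_\Phi W_\Phi^* G^* = \Sigma \right\}. \tag{17}$$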



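Consider the Lagrangian functional

$$L_H(W_\Phi, \Lambda) = \operatorname{tr}\int (W_\Phi - W_\Psi)(W_\Phi - W_\Psi)^* + \left\langle \Lambda, \int G W_\Phi W_\Phi^* G^* - \Sigma \right\rangle. \tag{18}$$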



For the same reason as before, we restrict the matrix $\Lambda$ to $\operatorname{Range}\Gamma$. The functional (18) is strictly convex, and its unconstrained minimization with respect to $W_\Phi$ yields the following condition for the optimal spectral factor $W_o^{H}$ (see [6] for details):

$$W_o^{H} - W_\Psi + G^*\Lambda G\, W_o^{H} = 0. \tag{19}$$

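In order to ensure that the corresponding spectrum is integrable over the unit circle, we now require a posteriori that $\Lambda$ belongs to the set

$$\mathcal{L}^{H} := \{\Lambda \in \mathcal{H}(n) \mid I + G^*\Lambda G > 0 \ \ \forall e^{j\vartheta} \in \mathbb{T}\}$$

or, which is the same, that it belongs to the set

$$\mathcal{L}_\Gamma^{H} := \mathcal{L}^{H} \cap \operatorname{Range}\Gamma. \tag{20}$$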

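Such a restriction yields the following optimal spectral factor and spectrum:

$$W_o^{H} = (I + G^*\Lambda G)^{-1} W_\Psi, \qquad \Phi_o^{H} = W_o^{H} W_o^{H*} = (I + G^*\Lambda G)^{-1}\, \Psi\, (I + G^*\Lambda G)^{-1}. \tag{21}$$

Now if $\Lambda$ is such that

$$\int G\, (I + G^*\Lambda G)^{-1}\, \Psi\, (I + G^*\Lambda G)^{-1}\, G^* = \Sigma, \tag{22}$$

then $\Phi_o^{H}$ in (21) is the unique solution to the constrained approximation Problem 2.5. Finding such a $\Lambda$ is the objective of the dual problem, which can be shown to be equivalent to

$$\text{minimize } \{ J_\Psi^{H}(\Lambda) \mid \Lambda \in \mathcal{L}_\Gamma^{H} \}, \tag{23}$$

where

$$J_\Psi^{H}(\Lambda) = \operatorname{tr}\int (I + G^*\Lambda G)^{-1}\, \Psi + \operatorname{tr} \Lambda\Sigma. \tag{24}$$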

Existence of a minimum is again a highly nontrivial issue. We have the 
following result (see [6]). 

Theorem 2.6 The strictly convex functional $J_\Psi^{H}$ has a unique minimum point in $\mathcal{L}_\Gamma^{H}$.

The minimum point of Theorem 2.6 provides the optimal solution to the primal Problem 2.5 via (21). It can be found by means of iterative numerical algorithms. The numerical minimization of $J_\Psi^{H}$ is a highly nontrivial problem, for reasons similar to those concerning $J_\Psi^{KL}$. In [15], we proposed a matricial version of the Newton algorithm that avoids any reparametrization of $\mathcal{L}_\Gamma^{H}$, and proved its global convergence.
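As an illustration of the objects entering the Hellinger dual problem, the sketch below evaluates the functional (24), the spectrum (21) and the gradient $\Sigma - \int G \Phi_o^{H} G^*$ on a uniform grid of the unit circle. It is not the matricial Newton algorithm of [15]; the names `transfer_on_circle` and `hellinger_dual`, the example system and the constant prior are our own assumptions, and the grid average stands in for the normalized integral.

```python
import numpy as np

def transfer_on_circle(A, B, thetas):
    """Samples G(e^{j theta}) = (e^{j theta} I - A)^{-1} B; result has shape (N, n, m)."""
    n = A.shape[0]
    return np.stack([np.linalg.solve(np.exp(1j * t) * np.eye(n) - A, B) for t in thetas])

def hellinger_dual(Lam, Sigma, Psi, Gs):
    """Value, gradient and spectrum (21) for the Hellinger dual functional (24)."""
    m = Gs.shape[2]
    # Q(theta) = I + G^* Lam G, one m x m Hermitian matrix per grid point.
    Q = np.eye(m) + np.einsum('tia,ij,tjb->tab', Gs.conj(), Lam, Gs)
    if np.min(np.linalg.eigvalsh(Q)) <= 0:
        return np.inf, None, None                     # Lam is outside the domain
    Qinv = np.linalg.inv(Q)
    Phi = Qinv @ Psi @ Qinv                           # optimal form (21)
    value = np.mean(np.einsum('tab,tba->t', Qinv, Psi)).real + np.trace(Lam @ Sigma).real
    moment = np.einsum('tia,tab,tjb->ij', Gs, Phi, Gs.conj()) / Gs.shape[0]
    return value, Sigma - moment, Phi                 # gradient vanishes when (22) holds

# Hypothetical example with a two-dimensional input (m = 2).
A = np.diag([0.5, -0.3, 0.2])
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
thetas = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
Gs = transfer_on_circle(A, B, thetas)
Psi = np.tile(np.array([[2.0, 0.5], [0.5, 1.0]]), (len(thetas), 1, 1))  # constant prior
Sigma = np.einsum('tia,tja->ij', Gs, Gs.conj()) / len(thetas)           # white-noise covariance
Lam = 0.1 * np.eye(3, dtype=complex)
value, grad, Phi = hellinger_dual(Lam, Sigma, Psi, Gs)
print(value, np.linalg.norm(grad))
```

The gradient expression follows from differentiating (24) in a Hermitian direction and using the cyclic property of the trace; setting it to zero reproduces the moment condition (22).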
3 Well-posedness of the approximation problems

In this section, we show that both the dual problems (12) and (23) are well-posed, since their unique solution is continuous with respect to small perturbations of $\Sigma$. The well-posedness of the respective primal problems then easily follows. All these continuity properties rely on the following basic result.

Theorem 3.1 Let $A$ be an open and convex subset of a finite-dimensional Euclidean space $V$. Let $f : A \to \mathbb{R}$ be a strictly convex function, and suppose that a minimum point $\bar{x}$ of $f$ exists. Then, for all $\varepsilon > 0$, there exists $\delta > 0$ such that, for each $p \in \mathbb{R}^n$ with $\|p\| < \delta$, the function $f_p : A \to \mathbb{R}$ defined as

$$f_p(x) := f(x) - \langle p, x \rangle$$

admits a unique minimum point $\bar{x}_p$, and moreover

$$\|\bar{x}_p - \bar{x}\| < \varepsilon.$$

(Note: $f^*(p) := -f_p(\bar{x}_p)$ is the Fenchel dual of $f$ at $p$.)

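Proof. First, note that the minimum point $\bar{x}$ is unique, since $f$ is strictly convex. Let $\varepsilon > 0$; since $A$ is open, we may assume without loss of generality that $\varepsilon$ is small enough for the closed ball of radius $\varepsilon$ centered at $\bar{x}$ to be contained in $A$. Let $S(\bar{x}, \varepsilon) = \{\bar{x} + y \mid \|y\| = \varepsilon\}$ denote the sphere of radius $\varepsilon$ centered at $\bar{x}$. Let moreover $B(\bar{x}, \varepsilon) = \{\bar{x} + y \mid \|y\| < \varepsilon\}$ denote the open ball of radius $\varepsilon$ centered at $\bar{x}$ and $\bar{B}(\bar{x}, \varepsilon) = \{\bar{x} + y \mid \|y\| \le \varepsilon\}$ its closure. Then $\bar{B}(\bar{x}, \varepsilon) = B(\bar{x}, \varepsilon) \cup S(\bar{x}, \varepsilon)$, $\bar{B}(\bar{x}, \varepsilon)$ and $S(\bar{x}, \varepsilon)$ are compact, and $S(\bar{x}, \varepsilon)$ is the boundary of $B(\bar{x}, \varepsilon)$. Since $f$ is continuous, it admits a minimum point $\bar{x} + y_\varepsilon$ over $S(\bar{x}, \varepsilon)$. Since $\bar{x}$ is the unique global minimum point of $f$, we must have $m_\varepsilon := f(\bar{x} + y_\varepsilon) - f(\bar{x}) > 0$. Then, for $\|y\| = \varepsilon$ we have

$$f(\bar{x} + y) - f(\bar{x}) \ge m_\varepsilon. \tag{25}$$

Let now $0 < \delta < m_\varepsilon / \varepsilon$. For $\|p\| < \delta$ and $\|y\| = \varepsilon$ we have

$$\langle p, y \rangle \le \|p\|\, \|y\| < \delta \varepsilon < m_\varepsilon, \tag{26}$$

where the first inequality stems from the Cauchy-Schwarz inequality. From (25) and (26), we get for $\|y\| = \varepsilon$

$$f(\bar{x} + y) - f(\bar{x}) > \langle p, y \rangle = \langle p, \bar{x} + y \rangle - \langle p, \bar{x} \rangle, \qquad \text{i.e.,} \qquad f_p(\bar{x} + y) > f_p(\bar{x}),$$

that is,

$$f_p(x) > f_p(\bar{x}) \quad \text{for each } x \in S(\bar{x}, \varepsilon).$$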

Now, since $f$ is strictly convex and hence continuous, $f_p$ is also strictly convex and continuous, and admits a minimum point $\bar{x}_p$ over the compact set $\bar{B}(\bar{x}, \varepsilon)$. But it follows from the previous considerations that such a minimum point cannot belong to $S(\bar{x}, \varepsilon)$. Hence, it must belong to the open ball $B(\bar{x}, \varepsilon)$. As such, $\bar{x}_p$ is also a local minimum of $f_p$ over $A$, but since $f_p$ is strictly convex, it is also the unique global minimum point. Summing up, for fixed $\varepsilon > 0$, there exists $\delta > 0$ such that, if $\|p\| < \delta$, then $f_p$ admits a unique minimum point $\bar{x}_p$ over $A$. It follows from the previous analysis that, for sufficiently small $\delta$, $\bar{x}_p$ belongs to $B(\bar{x}, \varepsilon)$. This proves the theorem. □
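As a sanity check of Theorem 3.1, consider a one-dimensional example of our own (it is not taken from the cited works) whose structure resembles that of the dual functionals of Section 2: the domain is the open half-line $(0, \infty)$ and $f(x) = x - \log x$. The minimizer of the perturbed function is available in closed form, and one admissible choice of $\delta$ can be written explicitly.

```latex
\begin{align*}
f(x) &= x - \log x, \quad x \in (0,\infty), & \bar{x} &= 1, \\
f_p(x) &= f(x) - p\,x = (1-p)\,x - \log x, & \bar{x}_p &= \frac{1}{1-p} \quad (p < 1), \\
|\bar{x}_p - \bar{x}| &= \frac{|p|}{1-p} < \varepsilon
  & &\text{whenever } |p| < \delta := \frac{\varepsilon}{1+\varepsilon}.
\end{align*}
```

For $0 \le p < \delta$ the bound follows from $p/(1-p) < \delta/(1-\delta) = \varepsilon$, and for $-\delta < p < 0$ from $|p|/(1+|p|) < |p| < \varepsilon$. For $p \ge 1$ the function $f_p$ is unbounded below and has no minimum point, which shows that the statement is genuinely local in $p$.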

3.1 Well-posedness of Kullback-Leibler approximation 



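Consider the dual functional (13), and let us make its dependence upon $\Sigma$ explicit:

$$J_\Psi^{KL}(\Lambda; \Sigma) = -\int \Psi \log G^*\Lambda G + \operatorname{tr} \Lambda\Sigma.$$

$J_\Psi^{KL}$ is a strictly convex functional over $\mathcal{L}_\Gamma^{KL}$, which is an open and convex subset of the Euclidean space $\operatorname{Range}\Gamma$. By Theorem 2.3, it admits a minimum point

$$\Lambda_o^{KL}(\Sigma) = \arg\min_{\Lambda} J_\Psi^{KL}(\Lambda; \Sigma).$$

Let $\delta\Sigma$ be a perturbation of $\Sigma$. We have

$$J_\Psi^{KL}(\Lambda; \Sigma + \delta\Sigma) = -\int \Psi \log G^*\Lambda G + \operatorname{tr} \Lambda\Sigma + \operatorname{tr} \Lambda\,\delta\Sigma = J_\Psi^{KL}(\Lambda; \Sigma) + \langle \delta\Sigma, \Lambda \rangle.$$

It follows from Theorem 3.1, applied with $p = -\delta\Sigma$, that for each $\varepsilon > 0$ there exists $\delta > 0$ such that if $\|\delta\Sigma\|_F < \delta$, then $J_\Psi^{KL}(\Lambda; \Sigma + \delta\Sigma)$ again admits a minimum point

$$\Lambda_o^{KL}(\Sigma + \delta\Sigma) = \arg\min_{\Lambda} J_\Psi^{KL}(\Lambda; \Sigma + \delta\Sigma) \tag{27}$$

and the distance $\|\Lambda_o^{KL}(\Sigma + \delta\Sigma) - \Lambda_o^{KL}(\Sigma)\|_F$ is less than $\varepsilon$. The above observation implies well-posedness of the dual problem: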
Corollary 3.2 The map

$$\Sigma \mapsto \Lambda_o^{KL}(\Sigma)$$

is continuous from $\mathcal{P}_\Gamma$ to $\mathcal{L}_\Gamma^{KL}$.

Consider now the primal problem. The variational analysis yielded the following optimal solution, where the dependence upon $\Sigma$ has been made explicit:

$$\Phi_o^{KL}(\Sigma) = \frac{\Psi}{G^* \Lambda_o^{KL}(\Sigma)\, G}.$$

We have the following result.

Theorem 3.3 The map

$$\Sigma \mapsto \Phi_o^{KL}(\Sigma)$$

is a continuous function from $\mathcal{P}_\Gamma$ to $L_\infty(\mathbb{T})$.

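Proof. Recall that $\Lambda_o^{KL}(\Sigma)$ is the solution of the dual problem when the true asymptotic state covariance is known, and let $\Lambda_o^{KL}(\Sigma + \delta\Sigma)$ be the solution to the dual problem with respect to a perturbed covariance. Let $\Phi_o^{KL}(\Sigma)$ and $\Phi_o^{KL}(\Sigma + \delta\Sigma)$ be the corresponding solutions to the primal problem. Then

$$\left\| \Phi_o^{KL}(\Sigma + \delta\Sigma) - \Phi_o^{KL}(\Sigma) \right\|_\infty = \left\| \frac{\Psi}{G^* \Lambda_o^{KL}(\Sigma + \delta\Sigma)\, G} - \frac{\Psi}{G^* \Lambda_o^{KL}(\Sigma)\, G} \right\|_\infty \le \|\Psi\|_\infty \left\| \frac{1}{G^* \Lambda_o^{KL}(\Sigma + \delta\Sigma)\, G} - \frac{1}{G^* \Lambda_o^{KL}(\Sigma)\, G} \right\|_\infty.$$

It is easily seen that for each $\eta > 0$ we can choose $\varepsilon > 0$ such that if $\|\Lambda_o^{KL}(\Sigma + \delta\Sigma) - \Lambda_o^{KL}(\Sigma)\|_F < \varepsilon$, then

$$\max_\vartheta \left| G^* \Lambda_o^{KL}(\Sigma + \delta\Sigma)\, G - G^* \Lambda_o^{KL}(\Sigma)\, G \right| = \max_\vartheta \left| G^\top(e^{-j\vartheta}) \left( \Lambda_o^{KL}(\Sigma + \delta\Sigma) - \Lambda_o^{KL}(\Sigma) \right) G(e^{j\vartheta}) \right| < \eta.$$

Moreover, $G^* \Lambda_o^{KL}(\Sigma)\, G$ is continuous and positive on the compact set $\mathbb{T}$, hence bounded away from zero. Finally, from the above observations, from Corollary 3.2, and from the continuity of the function $x \mapsto 1/x$ over $\mathbb{R}_+$, it follows that for each $\mu > 0$, there exists $\delta > 0$ such that, for all $\|\delta\Sigma\|_F < \delta$, $\|\Phi_o^{KL}(\Sigma + \delta\Sigma) - \Phi_o^{KL}(\Sigma)\|_\infty < \mu$. □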

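Corollary 3.4 The problem

$$\arg\min_{\Phi} D(\Psi\|\Phi) \quad \text{such that} \quad \int G \Phi G^* = \Sigma$$

is well-posed for $\Sigma \in \mathcal{P}_\Gamma$ and for variations $\delta\Sigma$ that belong to $\operatorname{Range}\Gamma$.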
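3.2 Well-posedness of Hellinger approximation

Consider the dual functional (24):

$$J_\Psi^{H}(\Lambda; \Sigma) = \operatorname{tr}\int (I + G^*\Lambda G)^{-1}\, \Psi + \operatorname{tr} \Lambda\Sigma.$$

$J_\Psi^{H}$ is a strictly convex functional over $\mathcal{L}_\Gamma^{H}$, which is an open and convex subset of the Euclidean space $\operatorname{Range}\Gamma$. By Theorem 2.6, it admits a minimum point

$$\Lambda_o^{H}(\Sigma) = \arg\min_{\Lambda} J_\Psi^{H}(\Lambda; \Sigma).$$

Let, as before, $\delta\Sigma$ be a perturbation of $\Sigma$. Then

$$J_\Psi^{H}(\Lambda; \Sigma + \delta\Sigma) = J_\Psi^{H}(\Lambda; \Sigma) + \langle \delta\Sigma, \Lambda \rangle.$$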



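Theorem 3.1 implies the following result.

Corollary 3.5 The map

$$\Sigma \mapsto \Lambda_o^{H}(\Sigma)$$

is continuous from $\mathcal{P}_\Gamma$ to $\mathcal{L}_\Gamma^{H}$.

The variational analysis yielded the optimal solution for the primal problem

$$\Phi_o^{H}(\Sigma) = \left(I + G^* \Lambda_o^{H}(\Sigma)\, G\right)^{-1} \Psi \left(I + G^* \Lambda_o^{H}(\Sigma)\, G\right)^{-1}, \tag{28}$$

and considerations similar to those of Theorem 3.3 lead to the following result.

Theorem 3.6 The map

$$\Sigma \mapsto \Phi_o^{H}(\Sigma)$$

is continuous from $\mathcal{P}_\Gamma$ to $L_\infty^{m\times m}$.

To prove Theorem 3.6 we exploit the following result established in [15] (Lemma 5.2):

Lemma 3.7 Define $Q_\Lambda(z) = I + G^*(z) \Lambda G(z)$. Consider a sequence $\Lambda_n \in \mathcal{L}_\Gamma^{H}$ converging to $\Lambda \in \mathcal{L}_\Gamma^{H}$. Then the functions $Q_{\Lambda_n}^{-1}$ are well defined and continuous on $\mathbb{T}$ and converge uniformly to $Q_\Lambda^{-1}$ on $\mathbb{T}$.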



Proof (of Theorem 3.6). Let $Q_\Lambda(z; \Sigma) = I + G^*(z)\, \Lambda_o^{H}(\Sigma)\, G(z)$. Apply Corollary 3.5 and Lemma 3.7 to establish the continuity of the map from $\mathcal{P}_\Gamma$ to $L_\infty^{m\times m}$ defined by $\Sigma \mapsto Q_\Lambda^{-1}(\cdot\,; \Sigma)$. The continuity of $\Sigma \mapsto \Phi_o^{H}(\Sigma)$ follows from the continuity of matrix multiplication. □
Corollary 3.8 The problem

$$\arg\min_{\Phi} d_H(\Phi, \Psi) \quad \text{such that} \quad \int G \Phi G^* = \Sigma$$

is well-posed for $\Sigma \in \mathcal{P}_\Gamma$ and for variations $\delta\Sigma$ that belong to $\operatorname{Range}\Gamma$.

4 Consistency 

So far we have shown that both approximation problems admit a unique solution for all $\Sigma \in \mathcal{P}_\Gamma$, and that the solution is continuous with respect to variations $\delta\Sigma \in \operatorname{Range}\Gamma$. The restriction to $\operatorname{Range}\Gamma$ becomes crucial when we only have an estimate $\hat{\Sigma}$ of $\Sigma$.

In line with the Byrnes-Georgiou-Lindquist theory, and following an estimation procedure we have sketched in [15], we want to use the above theory to provide an estimate $\hat{\Phi}$ of the true spectrum $\Phi$ of the process $y$.

Let $G(z)$ and $\Psi$ be given. Suppose that we feed $G(z)$ with a finite sequence of observations, say $\{y_1, \dots, y_N\}$, of the process. Observing the states of the system, say $\{x_1, \dots, x_N\}$, we then compute a Hermitian and positive definite estimate $\hat{\Sigma}$ of the asymptotic state covariance, such as

$$\hat{\Sigma} = \frac{1}{N} \sum_{k=1}^{N} x_k x_k^*.$$

This is provably consistent, and also unbiased, for we have supposed from the beginning that $y$ has zero mean. We seek an estimate $\hat{\Phi}$ of $\Phi$ by solving an approximation problem with respect to $G(z)$, $\Psi$, and $\hat{\Sigma}$.

Since $\hat{\Sigma}$ is not the true covariance anymore, the constraint (3) may not be feasible. Hence, in order to find a solution $\hat{\Phi}$, we need to find a second estimate $\tilde{\Sigma}$, close to the first, such that (3) is feasible with the covariance matrix $\tilde{\Sigma}$. A reasonable way to proceed is to let $\tilde{\Sigma}$ be the projection of $\hat{\Sigma}$ onto $\operatorname{Range}\Gamma$. Since orthogonal projectors from $\mathcal{H}(n)$ to a subspace of $\mathcal{H}(n)$ are continuous functions, if $\hat{\Sigma}(x_1, \dots, x_N)$ is a consistent estimator of $\Sigma$, then $\tilde{\Sigma}$ is also a consistent estimator of $\Sigma$.

The problem that may arise in proceeding this way is that the projection onto $\operatorname{Range}\Gamma$ need not be positive definite (that is, it may not belong to $\mathcal{P}_\Gamma$), even if $\hat{\Sigma}$ is. If this is the case, the correct procedure to estimate $\Sigma$ while preserving the structure of a state covariance compatible with $G(z)$ is to find $\tilde{\Sigma} \in \mathcal{P}_\Gamma$ which is closest to $\hat{\Sigma}$ in a suitable distance. This is an optimization problem in itself.
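The projection step can be sketched numerically as follows. This is a minimal illustration under our own assumptions, not the procedure of [15]: the helper names (`stein_solve`, `range_gamma_basis`, `project_onto_range`), the simulated white-noise input and the example matrices are hypothetical. It relies on Proposition 2.1, by which $\operatorname{Range}\Gamma$ is the set of solutions of $\Sigma - A\Sigma A^* = BH + H^*B^*$ as $H$ varies, and it computes the orthogonal projection with respect to the inner product $\langle X, Y\rangle = \operatorname{tr} XY$ by least squares. It does not address the positivity issue discussed above.

```python
import numpy as np

def stein_solve(A, Q):
    """Solves S - A S A^* = Q (A stable) by row-major vectorization."""
    n = A.shape[0]
    K = np.eye(n * n) - np.kron(A, A.conj())
    return np.linalg.solve(K, Q.ravel()).reshape(n, n)

def range_gamma_basis(A, B):
    """Columns span Range Gamma over the reals (real and imaginary parts stacked)."""
    n, m = B.shape
    columns = []
    for j in range(m):
        for k in range(n):
            for unit in (1.0, 1.0j):
                H = np.zeros((m, n), dtype=complex)
                H[j, k] = unit
                S = stein_solve(A, B @ H + H.conj().T @ B.conj().T)
                columns.append(np.concatenate([S.real.ravel(), S.imag.ravel()]))
    return np.column_stack(columns)

def project_onto_range(A, B, Sigma_hat):
    """Orthogonal projection of a Hermitian matrix onto Range Gamma."""
    n = A.shape[0]
    M = range_gamma_basis(A, B)
    b = np.concatenate([Sigma_hat.real.ravel(), Sigma_hat.imag.ravel()])
    coeffs, *_ = np.linalg.lstsq(M, b, rcond=None)
    v = M @ coeffs
    P = v[:n * n].reshape(n, n) + 1j * v[n * n:].reshape(n, n)
    return (P + P.conj().T) / 2          # remove rounding asymmetry

# Hypothetical experiment: white-noise input, sample covariance of the state.
rng = np.random.default_rng(0)
A = np.diag([0.5, -0.3, 0.2])
B = np.ones((3, 1))
N = 2000
x = np.zeros(3)
Sigma_hat = np.zeros((3, 3))
for _ in range(N):
    x = A @ x + B @ rng.standard_normal(1)
    Sigma_hat += np.outer(x, x) / N

Sigma_proj = project_onto_range(A, B, Sigma_hat)
Sigma_true = stein_solve(A, B @ B.T).real   # exact covariance for white noise
print(np.linalg.norm(Sigma_hat - Sigma_true), np.linalg.norm(Sigma_proj - Sigma_true))
print(np.linalg.eigvalsh(Sigma_proj))       # positivity must be checked separately
```

Since $\Sigma$ itself lies in $\operatorname{Range}\Gamma$ and orthogonal projections are non-expansive, the projected estimate is never farther from $\Sigma$ (in Frobenius norm) than the raw sample covariance.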



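The continuity results of the preceding sections imply two strong consistency results. Let $\tilde{\Sigma}(x_1, \dots, x_N) \in \mathcal{P}_\Gamma$ denote a consistent estimator of $\Sigma$. Let $\Phi_o^{KL}(\Sigma)$ be the solution to the Kullback-Leibler approximation problem with respect to the true asymptotic covariance and $\Phi_o^{KL}(\tilde{\Sigma}(x_1, \dots, x_N))$ be the solution of the same problem with respect to the estimate.

Corollary 4.1 If

$$\lim_{N \to \infty} \tilde{\Sigma}(x_1, \dots, x_N) = \Sigma \quad \text{a.s.}, \tag{29}$$

then

$$\lim_{N \to \infty} \left\| \Phi_o^{KL}(\tilde{\Sigma}(x_1, \dots, x_N)) - \Phi_o^{KL}(\Sigma) \right\|_\infty = 0 \quad \text{a.s.}$$

Proof. From the continuity of the map $\Sigma \mapsto \Phi_o^{KL}(\Sigma)$ we have that, excepting a set of zero probability,

$$\lim_{N \to \infty} \Phi_o^{KL}\big(\tilde{\Sigma}(x_1(\omega), \dots, x_N(\omega))\big) = \Phi_o^{KL}\Big(\lim_{N \to \infty} \tilde{\Sigma}(x_1(\omega), \dots, x_N(\omega))\Big) = \Phi_o^{KL}(\Sigma),$$

where the first limit is taken in $L_\infty(\mathbb{T})$. □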

As for the Hellinger multivariable approximation problem, let $\Phi_o^{H}(\Sigma)$ be the solution with respect to the true asymptotic covariance and $\Phi_o^{H}(\tilde{\Sigma}(x_1, \dots, x_N))$ be the solution with respect to the estimate. Employing the very same technique used for the proof of Corollary 4.1, it is easy to establish the following consistency result for the problem associated with the multivariable Hellinger distance.

Corollary 4.2 If

$$\lim_{N \to \infty} \tilde{\Sigma}(x_1, \dots, x_N) = \Sigma \quad \text{a.s.},$$

then

$$\lim_{N \to \infty} \left\| \Phi_o^{H}(\tilde{\Sigma}(x_1, \dots, x_N)) - \Phi_o^{H}(\Sigma) \right\|_\infty = 0 \quad \text{a.s.}$$

5 Conclusion 

In this paper, we have considered constrained spectrum approximation problems with respect to both the Kullback-Leibler pseudo-distance (scalar case) and the Hellinger distance (multivariable case). The range of the operator $\Gamma : \Phi \mapsto \int G \Phi G^*$ is the subspace of the Hermitian matrices that conveys all the structure that is needed from a positive-definite matrix in order to be an asymptotic covariance matrix of the system with transfer function $G(z)$. As such, it is also a natural subspace to which the domains of the respective dual problems should be constrained. We have shown that the condition $\Sigma \in \operatorname{Range}\Gamma$ is not only necessary for the feasibility of the moment problem $\{\Phi \mid \int G \Phi G^* = \Sigma\}$, but also sufficient for the continuity of the respective solutions with respect to $\Sigma$. This fact implies well-posedness of both kinds of approximation problems, and implies the consistency of the respective solutions with respect to a consistent estimator $\hat{\Sigma}$ of $\Sigma$, as long as it is restricted to $\operatorname{Range}\Gamma$. Similar results can be established along the same lines when employing any other (pseudo-)distance, as long as the functional form of the primal optimum depends continuously upon the Lagrange parameter $\Lambda$.